9 research outputs found

    Rumble: Data Independence for Large Messy Data Sets

    This paper introduces Rumble, an engine that executes JSONiq queries on large, heterogeneous and nested collections of JSON objects, leveraging the parallel capabilities of Spark so as to provide a high degree of data independence. The design is based on two key insights: (i) how to map JSONiq expressions to Spark transformations on RDDs and (ii) how to map JSONiq FLWOR clauses to Spark SQL on DataFrames. We have developed a working implementation of these mappings showing that JSONiq can efficiently run on Spark to query billions of objects into, at least, the TB range. The JSONiq code is concise in comparison to Spark's host languages while seamlessly supporting the nested, heterogeneous data sets that Spark SQL does not. The ability to process this kind of input, commonly found, is paramount for data cleaning and curation. The experimental analysis indicates that there is no excessive performance loss, occasionally even a gain, over Spark SQL for structured data, and a performance gain over PySpark. This demonstrates that a language such as JSONiq is a simple and viable approach to large-scale querying of denormalized, heterogeneous, arborescent data sets, in the same way as SQL can be leveraged for structured data sets. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables. Comment: Preprint, 9 pages

    JSONiq on Spark


    Absolute position measurement and control for Wärtsilä engine during slow turning

    The thesis work was carried out for the Engine Performance and Control department of Wärtsilä Finland Oy’s Marine Solutions division. It studied the implementation of absolute position measurement and control of the engine crank angle during the slow turning procedure. The topic arose as an effort to improve the functionality of Wärtsilä’s engine control system: at the time of its proposal, the crank angle remained unknown during the entire slow turning sequence. The aim of the thesis was to implement a solution that would allow precise adjustment of the engine position after slow turning had completed. Slow turning is a marine engine procedure in which the engine is rotated at low speed to check for water in the cylinders, which can lead to hydrostatic locking and thus cause severe damage during normal, high-speed operation. Because the tachometers used by Wärtsilä engine systems at the outset of this study were inductive, it was not possible to monitor rotational speeds below 100 rotations per minute. As a result, the engine stopped at an unknown position after slow turning, and manual engine positioning was required whenever maintenance had to be carried out. Research was conducted to investigate the current system and identify its limitations. In addition, the work needed to implement, test, and document an improved alternative was performed.

    Rumble: Data Independence for Large Messy Data Sets

    This paper introduces Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are more and more widespread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine, Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with a similar or better performance than other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables. ISSN:2150-809

    Populatii vulnerabile si fenomene de automarginalizare. Strategii de interventie si efecte perverse/ Vulnerable People and Self-Marginalisation. Intervention Strategies (Romanian Version)

    Social or community solidarity has several meanings that emerged with the evolution of society and the stabilization of collective values and standards. First, there is the common meaning of solidarity, expressed overall by the feeling of belonging to a group and the awareness of a duty to defend common causes with other members, to help meet their needs and aspirations, and to promote shared values. Second, there is the legal, normative meaning, according to which solidarity is conceived as a mechanism by which someone assumes obligations toward another person, taking charge of protection activities for those who need help at a specific moment, when they are victims of social and individual risks. No. pg. 58. Keywords: social or community solidarity, mechanism to assume the obligations, self-marginalisation

    MODIS-based multi-parametric platform for mapping of flood affected areas. Case study: 2006 Danube extreme flood in Romania

    Flooding remains the most widely distributed natural hazard in Europe, leading to significant economic and social impact. Earth observation data can now make fundamental contributions towards reducing the detrimental effects of extreme floods. Technological advances make it possible to develop online services that process high volumes of satellite data without the need for dedicated desktop software licenses. The main objective of the case study is to present and evaluate a methodology for mapping flooded areas based on indices derived from MODIS satellite images and using state-of-the-art geospatial web services. The methodology and the developed platform were tested with data for the historical flood event that affected the Danube floodplain in Romania in 2006. The results showed that, despite its relatively coarse resolution, MODIS data is very useful for mapping the development of the flooded area in large plain floods. Moreover, it was shown that the ability to adapt and combine existing global flood detection algorithms to fit local conditions is extremely important for obtaining accurate results.


    Assessment of Soil Moisture Anomaly Sensitivity to Detect Drought Spatio-Temporal Variability in Romania

    This paper assesses the sensitivity of the soil moisture anomaly (SMA), obtained from the Metop ASCAT Soil Water Index (SWI) product, for identifying drought in Romania. The SWI data were converted from relative values (%) to absolute values (m³ m⁻³) using the soil porosity method. The conversion results (SM) were validated against in situ soil moisture measurements from ISMN at a depth of 5 cm (2015–2020). The SMA was computed from a 10-day SWI product for the period 2007 to 2020. The analysis was performed for depths of 5 cm (near surface), 40 cm (sub surface), and 100 cm (root zone). The standardized precipitation index (SPI), land surface temperature anomaly (LST anomaly), and normalized difference vegetation index anomaly (NDVI anomaly) were computed in order to compare the extent and intensity of drought events. The best correlations between SM and in situ measurements were found for the stations located in the Getic Plateau (Bacles (r = 0.797) and Slatina (r = 0.672)), in the Western Plain (Oradea (r = 0.693)), and in the Moldavian Plateau (Iasi (r = 0.608)). The RMSE values ranged between 0.05 and 0.184. Furthermore, significant correlations between the SMA and the SPI, the LST anomaly, and the NDVI anomaly were registered in the second half of the warm season (July–September). Because the land is used predominantly for agriculture, the results can be useful for the management of water resources and irrigation in regions frequently affected by drought.